FOREWORD


Among the responsibilities assigned to the Office of the Manager, National
Communications System, is the management of the Federal Telecommunication
Standards Program. Under this program, the NCS, with the assistance of the
Federal Telecommunication Standards Committee, identifies, develops, and
coordinates proposed Federal Standards which either contribute to the
interoperability of functionally similar Federal telecommunication systems or
to the achievement of a compatible and efficient interface between computer and
telecommunication systems. In developing and coordinating these standards, a
considerable amount of effort is expended in initiating and pursuing joint
standards development efforts with appropriate technical committees of the
Electronic Industries Association, the American National Standards Institute,
the International Organization for Standardization, and the International
Telegraph and Telephone Consultative Committee of the International
Telecommunication Union. This Technical Information Bulletin presents an
overview of an effort which is contributing to the development of compatible
Federal, national, and international standards in the area of facsimile
standards. It has been prepared to inform interested Federal activities of the
progress of these efforts. Any comments, inputs or statements of requirements
which could assist in the advancement of this work are welcome and should be
addressed to:


Office  of  the  Manager 
National  Communications  System 
ATTN:  NCS-TS 
Washington,  DC  20305 
(202)  692-2124 
1.0  INTRODUCTION


This  document  summarizes  the  work  performed  by  Delta 
Information  Systems,  Inc.  for  the  Office  of  Technology  and 
Standards  of  the  National  Communications  System,  an  organization 
of  the  U.S.  Government,  under  Task  1  of  Contract  number 
DCA100-83-C-0047. 

The  purpose  of  the  task  is  to  simulate  and  evaluate  the 
pattern  recognition  algorithm  proposed  by  AT&T  for  Group  4 
facsimile. The Office of Technology and Standards, headed by
National Communications System Assistant Manager Marshall L.
Cain, is responsible for the management of the Federal
Telecommunications Standards Program, which develops
telecommunications standards whose use is mandatory for all federal
agencies.

The  CCITT  is  actively  working  toward  the  standardization  of 
Group  4  facsimile.  One  of  the  key  elements  of  this  CCITT 
recommendation  shall  be  the  compression  algorithm  for  encoding  the 
transmitted  facsimile  signal.  A  preliminary  standard  for  this 
coding  technique  has  been  established  consisting  of  an  extension 
of  the  Group  3  Modified  READ  Code  (MRC).  Several  investigators 
have  studied  a  class  of  more  advanced  compression  techniques  which 
recognize  recurring  patterns  (such  as  textual  characters)  and 
transmit  a  short  ASCII-like  code  to  represent  such  a  symbol.  The 
compression  for  this  type  of  coding  algorithm  exceeds  that  for  the 
basic Group 4 algorithm by a significant degree. Since these
compression techniques normally require an error-free environment,
which is available in Group 4 facsimile, AT&T has submitted a
specific proposal (Appendix B) to the CCITT for the design of such
a pattern recognition coding technique.

1.1  Algorithm  Overview 

The Pattern Recognition Algorithm processes the
facsimile image by extracting patterns from the image and
attempting to recognize them. The input image is examined
line by line. When a black pel is found, an attempt is made to
isolate the pattern to which it belongs. If the pattern
cannot be isolated within a window (32x32 bits for 200
lines/inch), a piece of the pattern is extracted from the
image. Isolated patterns are then compared with already
identified patterns, which are stored in a library. If a match
is found, the position of the pattern in the image and its
location in the library are coded. If no match is found, the
pattern is added to the library, and the bit image of the
pattern and its position are coded. The primary criterion for
a no-match decision is an error pel with an error weight of
four or more (see Appendix B, Section 3.3.a). For the purposes
of this evaluation, all images were processed with a reject
threshold of four and also with a reject threshold of three. For a
complete description of the pattern recognition coding
algorithm, see Appendix B.
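
The match-or-add flow described above can be illustrated with a minimal Python sketch. It is not the G4RECENCODE implementation: the names `encode_patterns` and `exact_match` are hypothetical, pattern isolation is assumed to have already happened, and the matcher is a simple equality test standing in for the error-weight criterion defined in Appendix B.

```python
from typing import Callable, List, Sequence, Tuple

Bitmap = Tuple[Tuple[int, ...], ...]  # rows of 0/1 pels for one isolated pattern
Position = Tuple[int, int]            # (line, pel) of the pattern in the image


def exact_match(candidate: Bitmap, stored: Bitmap) -> bool:
    """Stand-in matcher (assumption): the real test uses error weights
    and a reject threshold as specified in Appendix B."""
    return candidate == stored


def encode_patterns(
    patterns: Sequence[Tuple[Position, Bitmap]],
    matches: Callable[[Bitmap, Bitmap], bool] = exact_match,
):
    """Return (codes, library) for a sequence of already isolated patterns."""
    library: List[Bitmap] = []
    codes = []
    for position, bitmap in patterns:
        for library_id, stored in enumerate(library):
            if matches(bitmap, stored):
                # Match found: code the position and the library location.
                codes.append(("match", position, library_id))
                break
        else:
            # No match: add to the library, code the bit image and position.
            library.append(bitmap)
            codes.append(("new", position, bitmap))
    return codes, library


if __name__ == "__main__":
    a = ((1, 0), (1, 1))
    b = ((0, 1), (1, 1))
    codes, library = encode_patterns([((0, 0), a), ((0, 5), b), ((3, 0), a)])
    for code in codes:
        print(code)
```

In this sketch a repeated pattern costs only a position and a library index, which is the source of the compression gain over coding every bit image.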
2.0  TASK 1.0 - COMPUTER SOFTWARE


2.1  Overview

In order to evaluate the Group 4 Pattern Recognition
Algorithm, two software programs, G4RECENCODE and G4RECDECODE, were
written to simulate the facsimile encode and decode functions
described in the AT&T proposal to the CCITT dated November, 1984.
The software was written in Fortran X3.9-1978 with MIL-STD-1753
Extensions. The G4RECENCODE program processes the input document
image, creating a coded output file and a library pattern file (see
Figure 2.1). The G4RECDECODE program then processes the coded
output file, recreating from it the document image and a library
pattern file (see Figure 2.1). Along with these files, a log or
statistics file was generated which contained the following
information.

1)  Overall Image Statistics

    Total Coded Bits Per Image
    Compression Ratio
    Average Coded Bits Per Image Line

2)  Components of the Total Coded Bits Per Image

    Mode Bits
    Horizontal Position Bits
    No More Pattern Bits
    Vertical Displacement Bits
    Library ID Bits
    Library Pattern Size Bits
    Coded Library Pattern Bits

3)  Pattern Recognition Statistics Per Image

    Number of Patterns
    Number of Recognized Patterns
    Number of Library Matches Per Incoming Pattern

2.2  Resolution Dependent Program Variations

Since the images to be processed were at four different
resolutions, the G4RECENCODE and G4RECDECODE programs were written
to accommodate three different pattern window sizes: 32 bits, 48
bits, and 64 bits. It was also necessary to increase the number
of horizontal position bits and library pattern size bits by one
to accommodate resolutions of 240 and greater. The L-Pattern
vertical count was determined by dividing the pattern window size
by three, thereby giving counts of 10, 16, and 21 for 32, 48 and 64
bit windows respectively. Noise bit removal was also dependent on
resolution and was determined as follows:

1)  If the window is 32 bits, any isolated symbol whose
height and width were two or less and whose bit population
was two or less was removed.

2)  If the window is 48 bits, any isolated symbol whose
height and width were three or less and whose bit
population was four or less was removed.

3)  If the window is 64 bits, any isolated symbol whose
height and width were four or less and whose bit
population was eight or less was removed.
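
The window-dependent rules above can be written compactly. The following Python sketch is illustrative only: the names `NOISE_LIMITS`, `l_pattern_vertical_count` and `is_noise` are hypothetical, and truncating integer division is an assumption inferred from the stated counts of 10, 16 and 21.

```python
# Window size (bits) -> (max height and width, max bit population) for noise removal.
NOISE_LIMITS = {
    32: (2, 2),
    48: (3, 4),
    64: (4, 8),
}


def l_pattern_vertical_count(window: int) -> int:
    """Window size divided by three, truncated (assumed from 10, 16, 21)."""
    return window // 3


def is_noise(window: int, height: int, width: int, population: int) -> bool:
    """True if an isolated symbol would be removed as noise for this window."""
    max_extent, max_population = NOISE_LIMITS[window]
    return (
        height <= max_extent
        and width <= max_extent
        and population <= max_population
    )


if __name__ == "__main__":
    for window in (32, 48, 64):
        print(window, l_pattern_vertical_count(window), NOISE_LIMITS[window])
```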

The feature differences used in the library screening
procedure were held constant across all resolutions for the initial
24 computer runs and then doubled for three additional runs of the
400 lines per inch images of the three CCITT documents selected.


3.0  TASK 2.0 - COMPUTER SIMULATION

Twenty-four computer simulation runs were done, which
consisted of the following:

CCITT  Test  Pages  1,  5,  7 

Resolutions  (Lines/inch)  -  200,  240,  300,  400 
Rejection  thresholds  -  3,  4 
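
As a quick check on the run count, the combinations listed above can be enumerated. This is only an illustrative Python sketch; the variable names are not taken from the simulation software.

```python
from itertools import product

pages = (1, 5, 7)                    # CCITT test pages
resolutions = (200, 240, 300, 400)   # lines per inch
thresholds = (3, 4)                  # rejection thresholds

# Every page is run at every resolution with every rejection threshold.
runs = list(product(pages, resolutions, thresholds))

if __name__ == "__main__":
    print(len(runs))  # 3 pages x 4 resolutions x 2 thresholds = 24
```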

In addition, three computer simulation runs were done with the
feature differences increased as stated in Section 2.2. These
consisted of:

CCITT  Test  Pages  1,  5,  7 
Resolution  -  400 
Rejection  Threshold  -  4 

Tables 3.1 through 3.3 show the results of the first 24
computer runs. Table 3.1 lists the overall image compression
statistics. Table 3.2 lists all the components of the coded
output bits. Table 3.3 lists the information generated by the
pattern recognition algorithm pertaining to the number of patterns
per image, the number of recognized patterns per image and the number of
library matches per incoming pattern. Tables 3.4 through 3.6 show
the corresponding results for the three additional computer runs.

As can be seen in Table 3.1, compression ratios approximately
doubled as the resolution went from 200 to 400 lines per inch.
There was also a decrease in compression ratio when the reject
threshold was decreased to three. This decrease ranged from 4.4%
to 14.8% on CCITT Document #1, 7.6% to 16.3% on CCITT Document #5
and 16.3% to 19.5% on CCITT Document #7.
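
The quoted ratios and percentage decreases follow directly from the bit counts in Table 3.1. The Python sketch below reproduces a few of them; the function names are illustrative, and the compression ratio is assumed to be total input bits divided by coded output bits, which agrees with the tabulated values.

```python
# (document, resolution) -> (total input bits, coded bits at threshold 4,
#                            coded bits at threshold 3), taken from Table 3.1.
TABLE_3_1 = {
    ("#1", 200): (4036608, 45756, 47853),
    ("#1", 400): (16146432, 87893, 103173),
    ("#7", 400): (16146432, 392732, 487744),
}


def compression_ratio(input_bits: int, coded_bits: int) -> float:
    """Total input bits divided by coded output bits."""
    return input_bits / coded_bits


def percent_decrease(input_bits: int, coded_t4: int, coded_t3: int) -> float:
    """Drop in compression ratio when the reject threshold goes from 4 to 3."""
    ratio_t4 = compression_ratio(input_bits, coded_t4)
    ratio_t3 = compression_ratio(input_bits, coded_t3)
    return 100.0 * (ratio_t4 - ratio_t3) / ratio_t4


if __name__ == "__main__":
    for (document, resolution), row in TABLE_3_1.items():
        ratio = compression_ratio(row[0], row[1])
        drop = percent_decrease(*row)
        print(f"{document} @ {resolution}: ratio {ratio:.2f}, decrease {drop:.1f}%")
```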
IMAGE COMPRESSION STATISTICS TABLE 3.1

CCITT IMAGE         RESOLUTION  REJECTION  TOTAL INPUT  CODED OUTPUT  COMPRESSION  AVG CODED BITS/
                                THRESHOLD  BITS         BITS          RATIO        IMAGE LINE

#1 ENGLISH LETTER   200         4          4036608      45756         88.22        19.59
#1 ENGLISH LETTER   240         4          5736448      52695         108.86       18.81
#1 ENGLISH LETTER   300         4          8960000      62878         142.50       17.97
#1 ENGLISH LETTER   400         4          16146432     87893         183.71       18.81
#1 ENGLISH LETTER   200         3          4036608      47853         84.35        20.49
#1 ENGLISH LETTER   240         3          5736448      56081         102.29       20.02
#1 ENGLISH LETTER   300         3          8960000      69675         128.60       19.91
#1 ENGLISH LETTER   400         3          16146432     103173        156.50       22.08

#5 FRENCH JOURNAL   200         4          4036608      74287         54.38        31.80
#5 FRENCH JOURNAL   240         4          5736448      86245         66.51        30.79
#5 FRENCH JOURNAL   300         4          8960000      104964        85.36        29.99
#5 FRENCH JOURNAL   400         4          16146432     144636        111.64       30.95
#5 FRENCH JOURNAL   200         3          4036608      80314         50.26        34.38
#5 FRENCH JOURNAL   240         3          5736448      94429         60.75        33.71
#5 FRENCH JOURNAL   300         3          8960000      118587        75.56        33.88
#5 FRENCH JOURNAL   400         3          16146432     172749        93.47        36.97

#7 KANJI            200         4          4036608      212162        19.02        90.82
#7 KANJI            240         4          5736448      250532        22.90        89.44
#7 KANJI            300         4          8960000      287922        31.12        82.26
#7 KANJI            400         4          16146432     392732        41.11        84.06
#7 KANJI            200         3          4036608      253376        15.93        108.47
#7 KANJI            240         3          5736448      300721        19.08        107.36
#7 KANJI            300         3          8960000      349833        25.61        99.95
#7 KANJI            400         3          16146432     487744        33.10        104.40
CODED OUTPUT BITS TABLE 3.2


PATTERN RECOGNITION STATISTICS TABLE 3.3

CCITT IMAGE         RESOLUTION  REJECTION  # OF PATTERNS/  # OF RECOGNIZED  # OF LIBRARY
                                THRESHOLD  IMAGE           PATTERNS         MATCHES/PAT

#1 ENGLISH LETTER   200         4          1117            944              3.01
#1 ENGLISH LETTER   240         4          1079            918              2.48
#1 ENGLISH LETTER   300         4          1117            922              2.06
#1 ENGLISH LETTER   400         4          1136            896              1.81
#1 ENGLISH LETTER   200         3          1117            925              3.47
#1 ENGLISH LETTER   240         3          1079            890              2.96
#1 ENGLISH LETTER   300         3          1117            884              2.67
#1 ENGLISH LETTER   400         3          1136            830              2.78

#5 FRENCH JOURNAL   200         4          2468            2225             2.56
#5 FRENCH JOURNAL   240         4          2376            2101             2.21
#5 FRENCH JOURNAL   300         4          2484            2152             2.13
#5 FRENCH JOURNAL   400         4          2499            2055             2.17
#5 FRENCH JOURNAL   200         3          2468            2167             3.11
#5 FRENCH JOURNAL   240         3          2376            2038             2.87
#5 FRENCH JOURNAL   300         3          2484            2056             2.95
#5 FRENCH JOURNAL   400         3          2499            1910             3.47

#7 KANJI            200         4          3401            2804             5.62
#7 KANJI            240         4          3775            3114             3.43
#7 KANJI            300         4          3648            3000             2.45
#7 KANJI            400         4          3871            3090             1.82
#7 KANJI            200         3          3401            2626             7.45
IMAGE COMPRESSION STATISTICS TABLE 3.4

CCITT IMAGE         RESOLUTION  REJECTION  TOTAL INPUT  CODED OUTPUT  COMPRESSION  AVG CODED BITS/
                                THRESHOLD  BITS         BITS          RATIO        IMAGE LINE

#1 ENGLISH LETTER   400         4          16146432     87846         183.80       18.80
#5 FRENCH JOURNAL   400         4          16146432     144030        112.10       30.83
#7 KANJI            400         4          16146432     387990        41.62        83.05

CODED OUTPUT BITS TABLE 3.5
PATTERN RECOGNITION STATISTICS TABLE 3.6

CCITT IMAGE         RESOLUTION  REJECTION  # OF PATTERNS/  # OF RECOGNIZED  # OF LIBRARY
                                THRESHOLD  IMAGE           PATTERNS         MATCHES/PAT

#1 ENGLISH LETTER   400         4          1136            896              4.18
#5 FRENCH JOURNAL   400         4          2499            2057             4.48
#7 KANJI            400         4          3871            3098             4.90

From Table 3.2 it can be seen that the coded library pattern
bits are the major part of the coded output bits, making up
anywhere from 35% to 50% of the coded output bits.

Table 3.3 lists pattern recognition information from each of
the 24 computer runs. It should be noted that on all eight runs
of CCITT Document #7 the library pattern file limit of 512
patterns was exceeded. If the library pattern file were increased
in size, the compression ratios for this document would in all
probability improve.

Figures 3.1 and 3.2 are graphs of the data in Table 3.1 for
coded bits/page and the compression ratio, respectively.

In the additional three runs of the 400 lines per inch image
files with the loosened feature requirements, there was only a
very slight increase in compression ratios. Comparing the
corresponding lines in Tables 3.3 and 3.6, it is interesting to
note that although the number of compares per incoming pattern at
least doubled, the number of recognized patterns changed very
little, if at all.
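
The comparison between Tables 3.3 and 3.6 can be made explicit with a few lines of Python. The figures are copied from the 400 lines per inch, threshold 4 rows of those tables; the names `RUNS_400_T4` and `summarize` are illustrative only.

```python
# document -> (recognized patterns in Table 3.3, recognized patterns in Table 3.6,
#              library matches per pattern in Table 3.3, same in Table 3.6)
RUNS_400_T4 = {
    "#1 English Letter": (896, 896, 1.81, 4.18),
    "#5 French Journal": (2055, 2057, 2.17, 4.48),
    "#7 Kanji": (3090, 3098, 1.82, 4.90),
}


def summarize(row):
    """Return (change in recognized patterns, growth factor in compares)."""
    recognized_before, recognized_after, compares_before, compares_after = row
    return recognized_after - recognized_before, compares_after / compares_before


if __name__ == "__main__":
    for document, row in RUNS_400_T4.items():
        gained, factor = summarize(row)
        print(f"{document}: +{gained} recognized, compares x{factor:.2f}")
```

The extra library comparisons more than double the screening work while recognizing at most a handful of additional patterns, which is consistent with the very slight change in compression ratio.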




4.0  TASK 3.0 - IMAGE EVALUATION


After the coded image files were processed by the G4RECDECODE
program, the resulting output image file for each image was
printed. Each output image was visually compared against the
original input image. On the lowest resolution images (200 lines
per inch) processed with a reject threshold of four, there were
some areas where one symbol was replaced by a similar symbol. All
of these substitutions occurred on punctuation, lower case alpha
characters, or small size upper case characters (see Figures 4.1
through 4.9). Figures 4.10 and 4.11 are bit image printouts of part
of the small characters at the bottom of the English document,
showing a lower case e/s substitution. On all three CCITT documents
these substitutions were reduced when the same image was processed
at a reject threshold of three. On CCITT Document #1 (English
Letter) and CCITT Document #7 (Kanji) there were no apparent
substitutions at resolutions of 240, 300 and 400 lines per inch. On
CCITT Document #5 there were some small symbol substitutions in the
drawings at 240 lines per inch at a reject threshold of four, but
almost all of these disappeared at a reject threshold of three (see
Figures 4.12 and 4.13).

Another area of difference was found on CCITT Document #5
(French Journal), where vertical and horizontal line segments
appeared ragged and varied in stroke thickness. This is present in
the output images at all resolutions but is much less pronounced at
the higher resolutions (see Figures 4.14 and 4.15).

[Figures: printed output images of CCITT Document #1 (English Letter)]

[Figures: printed output images of a section of CCITT Document #5 (French Journal); legible printout headers include RECORD LENGTH 1728 and RECORD LENGTH 2048]
qui  met  un  temps  T0  pour  traverser.  La  frequence  / 

T 

entre  4  I’instant  t  .  (/-/„)  —  et  elle  met  un  temps 

A / 

7, _(/_/„)  Z~  pour  traverser,  ce  qui  la  fait  ressortir 
A / 

4  I'instant  7L  *e*lemrnr  Ainsi  done,  le  sienal  S(t\ 
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telle  ligne  *  retard  eit  donn*e  per  : 


Cat  est  d'auian1.  plus  valible  que  T  <y  est  plus 
grand  A  ce*.  egard  a  figure  2  represents  la  vraie  courbe 
doanert  ex.')  en  fonction  de/pour  les  valeurs  nume- 
nquei  mdiquees  page  precedentc. 


Dam  ce  caa.  le  filtre  adapt*  poum  itre  conslitue, 
conformdment  *  la  figure  3.  par  la  cascade  : 

—  d'un  fibre  passe-bande  de  tranjftrt  unite  pour 
/„  s.  /  s/0fi/  et  de  transfers  quasi  nul  pour 
f  <  fc  et  /  >  /0-t-A/,  filtre  ne  modifiant  pas  la  phase 
des  composants  le  traversant  ; 


Fkj  J 


—  fibre  suivi  d'une  ligne  k  retard  (LAR)  disper¬ 
sive  ayant  un  temps  de  propagation  de  groupe  Tt 
decrement  lirdeirement  avec  la  frequence  /  suivanl 
I'expression  : 


T, 


To+Uo-f) 


_r 

a/ 


(avec  r0  >  T) 


(voir  fig.  4). 


Et  cette  phase  est  bien  I’opposd  de  /dKLT). 
k  un  diphasage  constant  pr*s  (sans  importance) 
5.0 CONCLUSIONS AND RECOMMENDATIONS


5.1 Conclusions

Table 5.1 lists the compression ratios of the Modified READ Code II
(MRC II) algorithm as stated in the final report "Measurement of
Compression of the Modified READ Code II," prepared by Delta
Information Systems, Inc. for the National Communications System under
Contract No. DCA100-80-C-0042. Also listed are the compression ratios
for the pattern recognition algorithm at reject thresholds of three
and four. As can be seen, the pattern recognition algorithm shows a
significant increase in compression ratio over MRC II, with the
greatest increases at the higher resolutions at a reject threshold of
four. As stated in Section 4, the pattern recognition algorithm did
allow some small symbol substitutions at resolutions of 200 and 240,
but these had little if any effect on the content of the document.
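
The percentage columns of Table 5.1 appear to give the increase of each pattern recognition ratio over the corresponding MRC II ratio. As a quick check under that assumption (the function name is illustrative and not part of the report), the entries for the English letter at 200 pels per inch can be recomputed:

```python
def percent_increase(mrc_ratio, recog_ratio):
    """Percentage by which recog_ratio exceeds mrc_ratio."""
    return (recog_ratio / mrc_ratio - 1.0) * 100.0


# English letter at 200 pels/inch, values taken from Table 5.1.
thr4 = round(percent_increase(30.57, 88.22), 1)  # reject threshold of four
thr3 = round(percent_increase(30.57, 84.35), 1)  # reject threshold of three
print(thr4, thr3)
```

The same calculation reproduces the other percentage entries in the table.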

5.2 Recommendations

The compression ratios achieved by the pattern recognition algorithm
depend on the symbol matching rate and on the Modified READ II coding
of the library patterns. Because of this, the performance of the
recognition algorithm should be evaluated on document images of varying
print quality. Additionally, the current evaluation of the pattern
recognition algorithm did not attempt to improve the algorithm's
read rate or throughput performance. It is suggested that the
algorithm be evaluated with the following variations:

1) Increased library size for CCITT Document #7,
since the 512 library pattern limit caused the
retransmission of many symbols.

2) Prestored libraries.

CCITT IMAGE         RESOLUTION   COMP RATIO   COMP RATIO   % DIFF   COMP RATIO   % DIFF
                                 MRC II       RECOG THR4            RECOG THR3

#1 ENGLISH LETTER   200          30.57         88.22       188.6     84.35       175.9
                    240          36.54        108.86       197.9    102.29       179.9
                    300          45.44        142.50       213.6    128.60       183.0
                    400          59.57        183.71       208.4    156.50       162.7

FRENCH JOURNAL      200          17.61         54.38       208.8     50.26       185.4
                    240          21.00         66.51       216.7     60.75       189.3
                    300          25.91         85.36       229.4     75.56       191.6
                    400          34.55        111.64       223.1     93.47       170.5

#7 KANJI            200           7.59         19.02       150.6     15.93       109.8
                    240           9.12         22.90       151.1     19.08       109.2
                    300          11.43         31.12       172.3     25.61       124.1
                    400          15.50         41.11       165.2     33.10       113.5

COMPARISON OF MRC II AND PATTERN RECOGNITION
COMPRESSION RATIOS

TABLE 5.1

3) Reject threshold variations as a function of
resolution.

4) Variable library screening feature difference
counts should be evaluated for different
resolutions.

5) The pattern matching and error PEL evaluation
are very time consuming. Currently the incoming
pattern and library pattern are compared and
evaluated at nine positions. Some evaluation
of an early exit should be done, should the current
compare give a sufficiently low error count (good
match) or a sufficiently high error count (no match).

SOURCE: AT&T

TITLE:  PATTERN  RECOGNITION  CODING  FOR  GROUP  4  FACSIMILE 


1. INTRODUCTION

Facsimile  engineers  have  long  sought  better  coding  schemes  in  order 
to  improve  transmission  efficiency.  Since  1979  when  the  Modified 
Read  algorithm  was  standardized,  new  algorithms  have  been 
discovered  that  are  able  to  achieve  an  improvement  in  compression 
efficiency  by  as  much  as  a  factor  of  five.  These  new  algorithms 
typically  require  an  error  free  environment  which  is  for  the  first 
time  available  in  Group  4  facsimile.  Therefore,  it  is  timely  to 
consider  adopting  one  of  these  new  algorithms  as  an  optional  coding 
mode  for  Group  4  facsimile. 

AT&T  has  been  studying  facsimile  coding  extensively  and  has 
developed  a  new  algorithm  that  may  represent  an  advance  over 
previously  published  algorithms.  The  algorithm  is  described  in 
this  paper.  AT&T  urges  that  this  new  algorithm  be  adopted  as  an 
optional  coding  scheme  for  Group  4  facsimile. 


2.  PROPOSAL 

Before describing the new algorithm, it is useful to consider via
an  example  the  improvement  in  facsimile  compression  if  patterns  in 
an  image  are  recognized.  Consider  CCITT  test  document  4  which  is 
essentially  all  text.  The  Modified  Read  algorithm  can  achieve  a 
compression  ratio  of  about  7.4.  Next  consider  a  case  where 
document  4  is  directly  captured  as  key  strokes  and  is  encoded  via 
an  ordinary  8  bit  character  code.  The  second  procedure  would  yield 
an  equivalent  compression  ratio  of  105.5.  In  examining  the 
differences  between  the  two  approaches,  the  major  factor  is  that  in 
the  character  coding  case,  a  human  has  recognized  the  characters  in 
the  message.  This  allows  the  characters  to  be  encoded  in  a  very 
efficient  manner.  Traditional  facsimile  coding  exploits  only 
microscopic  properties  of  images.  In  the  case  of  the  Modified  Read 
Algorithm,  the  coder  only  looks  at  immediately  neighboring  pels. 

It  should  be  obvious  that  a  gain  in  coding  efficiency  is  possible 
if  a  coder  can  combine  a  set  of  microscopic  properties  in  order  to 
recognize  and  code  the  actual  macroscopic  structure  of  the  image. 

The  new  AT&T  coding  scheme  exploits  macroscopic  properties  of 
images.  Patterns  such  as  characters  or  line  segments  which  appear 
in  a  facsimile  image  several  times  are  described  and  transmitted 
only  once.  The  next  time  the  same  pattern  is  found,  a  special 
codeword  will  indicate  that  it  is  identical  to  a  previously 
transmitted  pattern. 

A  facsimile  image  is  examined  line  by  line.  When  a  black  pel  is 
located,  a  "pattern  isolator"  surrounds  connected  pels  in  order  to 
extract  the  entire  pattern  which  the  pel  is  a  member  of.  The 
pattern  is  decomposed  into  MxM  sub-blocks  called  symbols.  In  many 
cases,  a  symbol  will  be  an  isolated  pattern  that  entirely  fits 
within  the  MxM  block.  The  incoming  symbol  is  matched  with  already 
identified  symbols  which  are  stored  in  a  library.  If  a  match  is 
detected,  information  about  the  position  of  the  symbol  in  the  image 
and  its  location  in  the  library  is  coded.  If  no  match  is  found, 
the  incoming  symbol  is  added  to  the  symbol  library  which  is  assumed 
to  be  empty  at  the  beginning  of  the  coding  and  is  gradually  built 
up.  In  this  case,  an  accurate  description  of  the  symbol  and  its 
position  in  the  image  is  coded. 
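
The match-or-add loop described above can be sketched as follows. This is a minimal illustration, not the AT&T coder: the function name, the tuple-based output records, and the `matches` callback are all hypothetical, and the pattern isolation and codeword generation steps are left out.

```python
def code_symbols(symbols, matches):
    """Sketch of the match-or-add loop.

    symbols: list of (position, bitmap) pairs in the order they are found.
    matches: hypothetical callback, matches(bitmap, library_bitmap) -> bool.
    """
    library = []  # assumed empty at the beginning of the coding
    output = []
    for position, bitmap in symbols:
        index = next(
            (i for i, entry in enumerate(library) if matches(bitmap, entry)),
            None,
        )
        if index is not None:
            # Match: code only the position and the library location.
            output.append(("match", position, index))
        else:
            # No match: add to the library and code the full description.
            library.append(bitmap)
            output.append(("new", position, bitmap))
    return output, library
```

Here an exact or approximate matcher can be passed in as `matches`; the library simply grows as unmatched symbols are encountered.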

A  block  diagram  of  the  coding  algorithm  is  presented  in  Figure  1. 
The  coding  algorithm  is  described  in  detail  in  Appendix  1. 

The  proposal  described  in  detail  in  Appendix  1  can  be  divided  into 
two  parts.  The  coding  part,  including  the  library  management, 
contains  the  rules  allowing  the  communication  between  coder  and 
decoder.  This  part  must  be  standardized.  The  isolation  and 
matching  part  are  presented  here  only  to  demonstrate  the 
feasibility  of  the  system.  Each  manufacturer  can,  within  the 
limits  imposed  by  the  coding  standardization,  develop  its  own 
proprietary  isolation  and  matching  algorithms. 

3. RESULTS

Appendix  2  compares  the  new  AT&T  algorithm  with  the  Modified  Read 
algorithm.  The  coding  efficiency  averaged  over  the  eight  CCITT 
documents  is  increased  by  nearly  a  factor  of  2  compared  to  the 
Modified  Read  algorithm.  The  increase  is  very  high  for  images  that 
have  high  character  content.  For  example,  the  coding  efficiency  of 
document  4  can  be  improved  by  nearly  a  factor  of  5.  An  increase  in 
coding  efficiency  by  a  factor  of  about  2  is  obtained  for  images 
which  have  a  mixture  of  graphics  and  text  (document  5),  signature 
and  text  (document  1)  and  Chinese  characters  (document  7).  For 
documents  which  contain  hand  drawn  text  and  hand  drawn  graphics, 
such as documents 2 and 8, a slight decrease in coding efficiency is
observed. 

The  new  coding  algorithm  is  approximate  since  the  decoded  image  is 
not  identical  to  the  original.  The  distortions  are  very  small  as 
can  be  observed  by  looking  at  Figures  2  through  9  which  show  the 
eight  CCITT  documents  after  coding  and  subsequent  decoding.  There 
are  options  which  allow  increasing  the  fidelity  of  transmission  by 
tightening  the  matching.  However,  an  increase  in  image  fidelity 
generally  means  that  the  coding  efficiency  will  be  slightly 
decreased. 

Appendix  3  describes  how  the  coding  technique  can  be  extended  to 
different  document  sizes  and  resolutions.  Appendix  3  also 
discusses  those  cases  where  a  priori  knowledge  exists  about  the 
type  of  symbols  in  a  document.  This  knowledge  allows  the  use  of  a 
prestored  symbol  library.  This  situation  would  occur  in  at  least 
two  important  cases. 

•  The  case  where  a  multipage  document  is  being  coded.  The 
symbol  library  would  be  generated  on  the  first  page  using  the 
coding  algorithm  described  in  Appendix  1.  The  next  page  would 
be  coded  using  this  symbol  library  as  a  starting  point. 
Effectively  the  coder  would  view  the  document  as  a  single  page 
thousands  of  lines  long. 

•  The  case  where  the  type  font  is  known  allowing  the  use  of  a 
prestored  library. 

Experiments  with  CCITT  document  4  have  shown  that  the  compression 
ratio  can  be  increased  from  35.4  to  53.7  if  the  document  is  coded 
once  to  generate  a  library,  and  then  coded  a  second  time  with  that 
library  as  a  starting  point. 


4. PATENT STATUS

Use  of  the  proposed  AT&T  coding  scheme  may  require  a  patent  license 
obtainable  from  the  AT&T  Intellectual  Property  Matters 
Organization,  Greensboro,  North  Carolina,  USA.  It  is  the  policy  of 
AT&T  to  enter  into  licensing  agreements  on  reasonable  terms  for  the 
patents  that  it  holds. 


5.  CONCLUSION 


AT&T urges that the study of new optional Group 4 coding schemes be
undertaken. It feels, furthermore, that the coding scheme described
in this paper would be an excellent choice for adoption
as a new CCITT coding standard.


APPENDIX  I:  CODING  DESCRIPTION 


1. SYSTEM DESCRIPTION

Figure 1 shows in block diagram form the AT&T pattern matching
coding  algorithm.  The  algorithm  has  four  basic  components  which  are 
described  briefly  below.  Each  subcomponent  is  described  in  detail 
in  subsequent  sections. 

•  Pattern  Identification  Block  -  Image  data  is  fed  sequentially 
left  to  right,  line  by  line  into  this  block.  The  pattern 
identification  block  scans  through  the  raw  image  data  and 
determines  for  each  black  pel  if  the  pattern  it  is  contained 
in is either a symbol or a nonsymbol. A symbol is defined as
a set of black pels completely surrounded by white pels such
that the symbol can be completely contained within an MxM
region. A nonsymbol results when the pattern will not fit
within the MxM region. Therefore a nonsymbol is a fraction of
a black region. All patterns can be decomposed into symbols
and nonsymbols. Symbol extraction is unambiguous since it
can only be done one way. Nonsymbol decomposition can be
done in many ways. The AT&T pattern matching algorithm
defines only one of these procedures for decomposition. After
extraction, all symbols and nonsymbols are processed in the
same way and hence no further distinction is made between the
two.

The  position  in  the  image  of  each  symbol  is  recorded  and  the 
symbol  is  extracted  from  the  image  and  stored  for  subsequent 
processing.  Because  all  patterns  can  be  decomposed  into 
symbols or nonsymbols, the pattern identifier leaves no
residue.  Each  symbol  is  passed  through  a  feature  extractor 
that  subjects  it  to  a  sequence  of  metrics.  The  metric 
description  or  feature  set  of  the  symbol  is  used  for 
subsequent  symbol  identification. 

•  Library  Management  Block  -  A  symbol  library  stores  the  symbols 
that  are  detected  in  the  course  of  coding.  Initially  the 
library  is  empty.  New  symbols  and  their  features  are  added  to 
the  library  by  an  update  and  management  unit.  This  unit 
stores  symbols  rank  ordered  by  the  frequency  of  their 
occurrence.  Infrequently  used  symbols  are  deleted  from  the 
library  as  needed  to  make  room  for  new  symbols. 

•  Symbol  Matching  Block  -  Only  library  symbols  which  have  a 
feature  set  close  to  the  newly  identified  symbol  are  fed  to 
the  symbol  matching  block.  The  candidate  library  symbols  are 
template  matched  against  the  new  symbol.  If  a  correct  match 
is  detected,  the  symbol  matching  block  outputs  the  library 
identification  of  the  symbol  as  well  as  the  position  of  the 
symbol  in  the  image.  If  no  match  is  detected,  the  symbol 
matching  block  outputs  a  description  of  the  symbol  as  well  as 
its  location.  The  new  symbol  will  be  added  to  the  library  by 
the symbol library block.

• Coder Block - The coder receives either a library
identification or a pel by pel description of each new
incoming symbol. It also receives information about where the
symbol is located within the image. The coder stores all
symbols until the end of each scan line. The symbols are
sorted by order of occurrence so that the updated library will
be closely matched statistically to the symbols in the scan
line. Finally, for each symbol the pel map or library
identification is coded and the coded bits are output for
subsequent transmission or storage.

The  coder  described  in  this  appendix  assumes  that  the  width  of  the 
image is less than 1792 pels. This condition typically occurs for
an A4 document scanned at 200 pels per inch. However, 1792 is not a
fundamental  restriction.  Appendix  3  shows  how  the  coding  algorithm 
can  be  extended  to  images  of  arbitrary  size  and  resolution.  Coder 
overhead  is  reduced  by  choosing  the  smallest  possible  value  for  the 
maximum  page  width. 

A  32x32  pattern  window  was  chosen  for  use  with  200  ppi  images 
because  it  can  enclose  most  textual  symbols.  A  larger  pattern 
window  could  be  used  but  it  would  increase  coder  complexity. 
Appendix  3  discusses  the  use  of  a  larger  pattern  window, 
particularly  for  higher  resolution  images. 

Experience  has  shown  that  a  512  entry  library  is  efficient  for  use 
with a 32x32 pattern window. Fewer entries could be used but this
would  require  constant  retransmission  of  symbols.  A  larger  library 
would  involve  increased  overhead. 

The  window  size  and  maximum  page  width  should  be  established 
between  the  coder  and  the  decoder  via  protocol. 

The  first  and  last  rows  and  columns  of  the  image  are  assumed  to  be 
white.  This  allows  simplified  processing  of  patterns  at  the  edge  of 
the  image.  It  also  avoids  the  use  of  fictitious  pels  outside  the 
physical  image  area.  There  will  be  no  noticeable  reduction  in 
image  quality  if  these  pels  are  set  to  white. 


2.  THE  PATTERN  IDENTIFICATION  BLOCK 
2.1  Location  and  Isolation  of  Symbols 

A  pattern  is  defined  as  a  set  of  connected  black  pels.  A  pel  in  a 
pattern  must  connect  to  at  least  one  of  its  eight  neighbors.  The 
maximum  symbol  size  recognized  by  the  coder  is  32x32  pels.  Any 
connected  set  of  black  pels  within  the  32x32  block  will  be  isolated 
as  a  symbol.  Upon  identification,  the  symbol  will  be  extracted 
from  the  image  and  erased  from  the  image. 
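
As an illustration of the connectivity rule above, the sketch below collects and erases the black pels that are 8-connected to a starting pel. It is a simplified example, not the AT&T isolator: the function name and the list-of-lists image representation are assumptions, and the 32x32 window limit is deliberately not enforced here.

```python
def isolate_pattern(image, start_row, start_col):
    """Collect and erase the 8-connected black pels reachable from a black start pel.

    image is a list of rows holding 1 (black) or 0 (white); it is modified in place.
    """
    rows, cols = len(image), len(image[0])
    image[start_row][start_col] = 0
    stack = [(start_row, start_col)]
    pels = []
    while stack:
        r, c = stack.pop()
        pels.append((r, c))
        # Visit the eight neighbors of the current pel.
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and image[nr][nc] == 1:
                    image[nr][nc] = 0  # erase as soon as the pel is queued
                    stack.append((nr, nc))
    return sorted(pels)
```

A full isolator would additionally stop at the edges of the pattern window and apply the partition rule for patterns that do not fit.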

Isolated symbols containing only 1 or 2 black pels are simply
erased. Any pattern bigger than the 32x32 window will be
decomposed into several symbols, using the following partition
rule:

a. The starting pel, which is found by sequentially processing
the image line by line, left to right, is always included.

b. All the connected black pels on the right side of or under the
starting pel are included as long as they are contained in
the 32x32 pel window whose upper left pel is the starting
pel.

c. As many connected black pels as possible within the 32x32 pel
window size are added to the left of the starting pel without
losing the pels stored by procedures a and b above.

d. The identified symbol is extracted and erased from the
picture. The next symbol is isolated using rules a), b) and
c) until the entire pattern is removed from the image.

This  isolation  scheme  is  modified  to  improve  performance  by  the 
elimination  of  some  L  shaped  patterns  that  appear  when  isolating  a 
big  black  region.  If  beginning  with  the  starting  or  second  pel  the 
isolated  pattern  has  a  vertical  edge  of  10  or  more  pels,  and  then  a 
horizontal  edge  to  the  right,  the  lower  side  of  the  window  is 
raised  so  that  the  horizontal  edge  will  be  just  excluded. 

The  position  of  each  isolated  symbol  is  recorded.  The  location  of 
the  symbol  is  taken  to  be  the  upper  left  pel  in  the  window. 

2.2  Feature  Extractor 

The  feature  extractor  classifies  each  new  symbol  so  that  it  can  be 
quickly  compared  to  symbols  stored  in  the  library.  Features  are 
determined  by  measuring  a  new  symbol  against  a  set  of  metrics.  The 
following  four  features  are  used: 


1. length of symbol (feature 1)

2. height of symbol (feature 2)

3. number of horizontal white runs included in the symbol
(feature 3)

4. number of vertical white runs included in the symbol
(feature 4)

1) and 2) are self-explanatory. 3) and 4) are the number of white
runs followed and preceded by black pels.
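
The four features can be sketched as below. This is one possible reading, with assumptions stated plainly: the symbol is a tightly cropped list of rows of 0 (white) and 1 (black), "length" is taken to mean width in pels, and the white-run counts are summed over all rows (feature 3) and all columns (feature 4). The function names are illustrative.

```python
def interior_white_runs(line):
    """Count runs of white pels that are both preceded and followed by a black pel."""
    runs = 0
    seen_black = False
    in_white = False
    for pel in line:
        if pel == 1:
            if seen_black and in_white:
                runs += 1  # a white run has just been closed by a black pel
            seen_black = True
            in_white = False
        else:
            in_white = True
    return runs


def extract_features(symbol):
    """Return (length, height, horizontal white runs, vertical white runs)."""
    length = len(symbol[0])
    height = len(symbol)
    horizontal = sum(interior_white_runs(row) for row in symbol)
    vertical = sum(interior_white_runs(col) for col in zip(*symbol))
    return (length, height, horizontal, vertical)
```

For example, a 3x3 ring of black pels around one white pel has one enclosed white run in each direction.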


3.  MATCHING 

The matching is divided into three parts: 1) Screening, which makes a
selection of the library symbols and directs only matching
candidates to the template matcher. 2) Template matching, which
creates a new binary picture called an error picture containing
black pels or "1" in the locations where the two template matched
symbols are dissimilar. 3) Matching decision process, which uses
the error pictures and other information to decide whether a
correct match has occurred.

3.1  Screening 

The  purpose  of  the  screen  is  to  expedite  the  matching  process.  The 
screen  only  directs  to  the  template  matcher  library  symbols  which 
have  a  chance  of  matching  the  incoming  (unknown)  symbol.  The 
screen  decides  the  order  in  which  they  are  to  be  sent  to  the 
matcher.  The  most  likely  candidate  for  a  match  is  sent  first.  The 
probability  of  a  match  of  symbols  depends  not  only  on  the 
similarity  of  their  features,  but  also  on  the  probability  of 
occurrence of the library symbols. For example, an incoming symbol
having the same feature distance to an O and a Q is much more likely
to match the O than the Q since O is much more frequent than Q.

The  probability  of  occurrence  of  a  symbol  is  taken  into  account  by 
sorting  the  library  symbols  according  to  the  number  of  times  they 
have matched. The sorting procedure is described in Section 4.3.
The  feature  distance  is  taken  into  account  by  allowing  for  each 
feature  only  a  fixed  margin  between  the  two  symbols. 

A  two-pass  screen  has  been  found  to  be  very  efficient.  In  the 
first  pass,  a  very  tight  screen  is  applied.  A  second  much  looser 
screen  is  applied  only  in  the  few  cases  where  no  match  occurred. 

Let Fi be the feature value of the incoming symbol for feature i
and let Fi' be the same for a library symbol. The tight screen is
defined by:

The loose screen is used only if the new symbol cannot match any of
those  library  symbols  which  were  identified  as  possible  matches  by 
the  tight  screening  process.  The  loose  screen  is  defined  by: 

|F1 - F1'| <= 3 and

|F2 - F2'| <= 3 and

|F3 - F3'| <= 11 and

|F4 - F4'| <= [illegible]


If  no  match  is  found  even  after  loose  screening,  the  coder 
concludes  that  no  match  is  possible.  The  symbol  is  subsequently 
added  to  the  symbol  library. 
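
The two-pass screening can be sketched as follows. This is only an illustration under loud assumptions: the margins are passed in as parameters because the tight margins and the fourth loose margin are not legible in this copy, the `template_match` callback is hypothetical, and the library is assumed to be held in rank order so that frequently matched symbols are tried first.

```python
def passes_screen(new_features, library_features, margins):
    """True if every feature difference is within its allowed margin."""
    return all(
        abs(f - g) <= m
        for f, g, m in zip(new_features, library_features, margins)
    )


def find_match(new_features, library, tight, loose, template_match):
    """Try the tight screen first, then the loose screen.

    library: feature tuples in rank order (most frequently matched first).
    template_match: hypothetical callback, template_match(index) -> bool.
    Returns the index of the matching library symbol, or None.
    """
    for margins in (tight, loose):
        for index, features in enumerate(library):
            if passes_screen(new_features, features, margins) and template_match(index):
                return index
    return None  # no match: the symbol would be added to the library
```

For simplicity the loose pass retests candidates that already failed after the tight screen; a practical coder would skip them.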

3.2 Template Matching

The  template  matcher  creates  a  new  picture  called  an  error  picture 
which  contains  "1"  in  the  locations  where  the  new  symbol  differs 
from  a  symbol  stored  in  the  library.  It  is  obtained  simply  by 
performing  a  pel  by  pel  exclusive  OR  between  the  two  symbols. 

Figure 10 is an example of where two symbols which represent the
same  character  are  matched.  Figure  11  shows  the  matching  of  two 
different  symbols.  As  seen  from  Figure  10,  many  elements  can  be 
in  error  even  if  two  of  the  same  symbols  are  matched. 

A  total  of  nine  template  matches  are  made  between  two  symbols.  One 
of  them  is  obtained  when  the  upper  left  pel  of  the  two  blocks 
containing  the  symbols  are  superimposed  while  the  eight  others  are 
obtained  by  moving  one  of  the  symbols  horizontally  or  vertically  by 
one  pel  or  both  vertically  and  horizontally  by  one  pel  in  all  the 
possible  directions. 
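
The nine error pictures can be sketched with a set-based representation. This is an assumption made for brevity, not the representation used in the report: each symbol is a set of (row, column) coordinates of its black pels relative to the upper left pel of its block, so the pel by pel exclusive OR becomes a symmetric set difference.

```python
# The aligned position plus the eight one-pel displacements.
OFFSETS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]


def error_picture(symbol_a, symbol_b, dr, dc):
    """Exclusive OR of symbol_a with symbol_b displaced by (dr, dc)."""
    shifted = {(r + dr, c + dc) for r, c in symbol_b}
    return symbol_a ^ shifted


def nine_error_pictures(symbol_a, symbol_b):
    """Map each of the nine displacements to its error picture."""
    return {
        offset: error_picture(symbol_a, symbol_b, *offset)
        for offset in OFFSETS
    }
```

An empty error picture at some displacement means the two symbols are identical at that alignment.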

3.3 Matching Decision

The  decision  about  a  match  is  made  by  looking  at  the  neighborhood 
of  each  error  pel  in  the  error  picture.  The  decision  is  based  on  a 
local  rejection  rule.  At  each  error  pel,  a  rejection  test  is  made. 
An  "error  weight"  is  defined  as  the  number  of  adjacent  neighbors  an 
error  pel  has.  The  error  weight  of  a  pel  can  vary  from  0  to  8 
depending  on  the  neighboring  error  pels.  A  match  is  rejected  if: 

a. an error pel has an error weight of 4 or more, or

b.  an  error  pel  has  an  error  weight  of  2  or  more  and 

1.  at  least  two  of  its  neighboring  error  pels  are  not 
connected  and 

2.  one  of  the  two  pels  from  the  symbols  used  to  obtain  the 
error  pel  has  a  corresponding  "pel  weight"  of  0  or  8 
(corresponding  to  0  or  8  surrounding  black  pels). 

A  correct  match  is  considered  to  have  occurred  if  the  whole  error 
picture  can  be  processed  without  rejection. 

Since  nine  template  matches  are  made  between  two  symbols,  there  are 
at  times  two  or  more  alignments  that  lead  to  a  correct  match.  In 
that  case,  the  position  which  has  the  lowest  count  of  the  sum  of 
the  error  weights  is  chosen. 
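
The error weight and the choice among several acceptable alignments can be sketched as below. This is a partial illustration only: it implements rejection rule a and the lowest-total-weight selection, and omits rule b. Error pictures are assumed to be sets of (row, column) error pel coordinates, and the function names are illustrative.

```python
NEIGHBORS = [
    (dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)
]


def error_weight(error_pels, pel):
    """Number of the eight adjacent positions that are also error pels (0 to 8)."""
    r, c = pel
    return sum((r + dr, c + dc) in error_pels for dr, dc in NEIGHBORS)


def rejected_by_rule_a(error_pels):
    """Rule a: reject if any error pel has an error weight of 4 or more."""
    return any(error_weight(error_pels, pel) >= 4 for pel in error_pels)


def best_alignment(pictures):
    """pictures maps a displacement to its error picture.

    Returns the accepted displacement with the lowest sum of error weights,
    or None if every alignment is rejected (rule b is not applied here).
    """
    totals = {
        offset: sum(error_weight(errors, pel) for pel in errors)
        for offset, errors in pictures.items()
        if not rejected_by_rule_a(errors)
    }
    return min(totals, key=totals.get) if totals else None
```

Scattered, isolated error pels therefore pass, while a solid cluster of errors causes rejection.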


4. CODING AND LIBRARY UPDATING

4.1 Coding of the Position of a Symbol

The  first  codeword  for  a  symbol  is  its  position.  A  special 
position  codeword  111  indicates  that  there  are  no  more  symbols  on 
the  line.  Any  codeword  not  starting  with  111  can  be  used  to 
indicate  the  position  of  a  symbol.  The  horizontal  and  vertical 
positions of each symbol are coded. The reference point is the upper
left corner of the block that contains the symbol.

The  absolute  horizontal  position  of  each  symbol  is  coded  by  eleven 
bit  two's  complement  binary.  Variable  length  run-length  coding  is 
not  justified  since  the  distance  between  the  symbols  and  the  edge 
of  the  document  is  typically  long. 

The symbols can be transmitted in a nonsequential order.
Reordering has been found to lead to a significant decrease in the
average code length for the library identification codewords.

With 1792 pels/line, an 11 bit binary codeword can code the
horizontal position of a symbol without using any codeword that
starts with 111.
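
The 1792 limit follows from the 111 prefix: 1792 is 11100000000 in binary, so it is the smallest 11 bit value that starts with 111. A minimal sketch, assuming the position is sent as a plain unsigned 11 bit value (the function name is illustrative):

```python
def position_codeword(column):
    """11 bit binary codeword for a horizontal position from 0 to 1791."""
    if not 0 <= column < 1792:
        raise ValueError("column must be in the range 0..1791")
    return format(column, "011b")


# No valid position collides with the end-of-line codeword prefix 111.
no_collision = all(
    not position_codeword(column).startswith("111") for column in range(1792)
)
print(position_codeword(1791), no_collision)
```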

The  vertical  position  of  a  symbol  is  coded  in  the  following  way: 

a.  A  mode  bit  at  the  beginning  of  each  line  indicates  whether 
there  are  any  symbols  on  the  line.  The  mode  bit  is  0  if  any 
symbol  is  detected  on  the  line.  It  is  1  if  no  symbols  are 
detected. 

b.  The  symbols  on  a  line  are  coded  and  are  followed  by  the 
codeword  111  which  indicates  that  there  are  no  more  symbols 
on  the  line. 

c.  When  a  symbol  is  replaced  by  a  library  symbol,  the  position 
of  the  library  symbol  might  move  up  or  down  one  position. 
Therefore,  after  the  library  identification  has  been  coded, 
the  codewords  10  and  11  are  used  to  position  the  library 
symbol  up  or  down  respectively.  The  codeword  0  is  used  to 
indicate  no  vertical  displacement.  The  vertical  displacement 
codeword  is  not  sent  with  a  new  library  symbol. 

Figure  12  shows  examples  of  the  message  format  for  the  symbol 
positioning. 

4.2 Coding of the Library Symbol Description

A  symbol  must  fit  within  a  32x32  pel  block.  The  description  of  the 
symbol  starts  with  a  5  bit  binary  word  which  indicates  the  height  H 
of  a  pattern.  The  length  of  a  pattern  is  extended  to  32  pels  if 
necessary  by  appending  0's  to  the  right  end.  Therefore,  there  are 
32xH pels to code. For coding efficiency, one white pel ("0") is
added  at  the  beginning. 

A  coding  line  is  made  from  the  32xH+1  pels  which  are  aligned  in 
raster  scan  order.  The  reference  line  is  similar  to  the  coded  line 
except  that  all  the  pels  are  shifted  to  the  right  by  32  pels  (one 
line).  Therefore,  the  pel  on  the  coding  line  has  the  same  column 
coordinate  as  the  pel  on  the  line  above  has  on  the  reference  line. 
The  line  is  then  coded  by  the  CCITT  Modified  Read  Algorithm,  with 
the  only  modification  that  the  first  code  word,  which  is  always  the 
horizontal  mode  code  word,  is  deleted  since  it  is  not  necessary. 

For  coding  efficiency,  switching  is  allowed  between  two  modes  for 
the  coding  of  the  library  symbol  description.  The  first  mode  is 
the one described above, called "horizontal coding." The other is
called "vertical coding" and is the same as above except that the
symbol is coded column after column from top to bottom.

A  header  bit  indicates  which  mode  is  chosen  with  a  "0"  for 
horizontal  mode  and  a  "1"  for  vertical  mode.  It  is  followed  by  a  5 
bit  word  which  indicates  the  length  of  a  symbol. 

4.3 Coding of the Symbol Identification

A  codeword  is  sent  for  each  new  symbol  to  indicate  which  library 
symbol  has  produced  a  match.  If  the  symbol  number  is  coded  in 
two's  complement  binary,  9  bits  would  be  required  for  a  library 
with  512  symbols. 

The  coding  procedure  described  here  leads  to  an  average  coding 
length of less than 5 bits/symbol. This result is obtained by
continuous library updating and variable length coding.

4.3.1  Library Updating and Management  Library management and
updating are done for the following purposes:

a.  Accept  new  library  symbols,  and,  if  necessary,  delete  a 
seldom  used  library  symbol  to  make  place  for  the  new  one. 

b.  Organize  the  library  for  the  fastest  possible  match. 

c.  Organize  the  library  for  minimum  average  library 
identification  coding  length. 

All three purposes require keeping track of the number of times
each library symbol is used. A correct match can be obtained
rapidly if the library symbols are ordered in decreasing usage.
This way the most used library symbols will be accessed first. An
efficient coding of the symbol identification is obtained by
giving short codewords to the first symbols in the list. The last
symbol in the list, which is one of the least used symbols, can be
deleted to make place for a new symbol.

The  updating  rule  for  the  symbols  in  the  library  is  as  follows: 

a.  When  a  symbol  matches  a  library  symbol  number  K,  that  library 
symbol  is  moved  to  number  K/2  and  all  the  symbol  numbers  from 
K/2  to  K-1  are  increased  by  1. 

b.  When  a  new  symbol  is  added,  it  gets  number  N/2  where  N  is  the 
number  of  library  symbols.  The  symbols  with  numbers  from  N/2 
to  N  will  be  increased  by  1,  and  if  necessary,  the  last 
library  symbol  is  dropped. 

This  updating  procedure  has  been  found  to  be  efficient  in  giving 
low  identification  numbers  to  often  used  symbols. 

The updating and coding are done only at the end of each scan line.
A library symbol is moved only once on a line, even if it matches
several incoming symbols on that line.
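The updating rule can be sketched as below. This is a minimal illustration under stated assumptions, not the original implementation: the library is modelled as a Python list whose index i holds library symbol number i + 1, K/2 and N/2 are taken as integer division with a lower bound of 1, and the function names are hypothetical. The end-of-line pass processes the matched numbers in increasing order, as the coding summary specifies, so that moving one symbol never changes the number of a symbol still waiting to be moved.

```python
def promote(library, k):
    """Rule a: move library symbol number k to number k/2."""
    target = max(1, k // 2)            # assumption: integer division, at least 1
    library.insert(target - 1, library.pop(k - 1))


def add_symbol(library, symbol, capacity=512):
    """Rule b: a new symbol gets number N/2; drop the last one if needed."""
    target = max(1, len(library) // 2)
    library.insert(target - 1, symbol)
    del library[capacity:]             # no effect unless the library overflowed


def end_of_line_update(library, matched_numbers):
    """Apply rule a once per matched symbol, in increasing number order."""
    for k in sorted(set(matched_numbers)):
        promote(library, k)


library = list("ABCDEFGH")             # symbols numbered 1 to 8
end_of_line_update(library, [6, 2, 6]) # symbol 6 matched twice, moved once
print("".join(library))                # BAFCDEGH
```

Symbol F, matched at number 6, ends up at number 3, and B moves from number 2 to number 1; the symbols in between shift down by one place.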

4.3.2  Symbol Identification Coding Table  The symbol
identification code table includes two special codewords: new
symbol and same symbol. The "new symbol" codeword makes it
unnecessary to send an identification number for a new library
symbol. The "same symbol" codeword indicates that the transmitted
symbol is the same as the previously transmitted symbol. It is
particularly useful for typewritten text, where the line by line
search for a symbol often detects the same symbols (characters on
a line).

The  code  table  for  the  symbol  identification  is  given  in  Table  1. 

Experience has shown that, without sorting, this code leads to an
average library identification code length of less than 7 bits,
compared to the 9 bits required for a fixed length code. These
results are based on the 8 CCITT documents. Sorting, which is
described in the next section, further reduces the code length.

4.3.3  Symbol  Identification  Coding  and  Sorting  The  absolute  code 
for  the  horizontal  position  of  a  symbol  allows  symbols  detected 
along  a  line  to  be  transmitted  in  any  order.  A  precondition  is 
that  the  library  updating  must  be  done  only  at  the  end  of  the  line. 
By sorting the symbols on a line according to their library number,
the average coding length of the library identification has been
found to be less than 5 bits. This obtains because: 1) there are
many more identical symbols in succession; 2) the library symbol
number is run length coded; and 3) the new library symbols are sent
at the end. Therefore, the new symbol codeword is sent only once on
a line, since any symbols that follow it are automatically new
symbols.

This  can  be  illustrated  by  an  example:  let  a  line  have  the 
following  symbols:  symbol  23,  new  symbol,  symbol  28,  same,  symbol 
23,  new  symbol. 

By looking at Table 1, the coding length is 7 + 5 + 7 + 3 + 7 + 5
= 34 bits. With sorting, the symbols become: symbol 23, same,
symbol 28, same, new symbol, new symbol. The coding length is
7 + 3 + 5 + 3 + 5 + 0 = 23 bits.
It  should  be  noted  in  this  example  that  symbol  28  is  coded  as 
symbol  5  since  only  the  increase  in  ID  number  compared  to  the 
previous  symbol  is  coded. 
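The arithmetic of this example can be checked with a short sketch. The codeword lengths are those of Table 1; the function names and the event representation are hypothetical, and it is assumed that the first library symbol on a sorted line is coded with its absolute number while each later one is coded as the increase over its predecessor.

```python
SAME, NEW = "same", "new"


def id_code_length(number):
    """Codeword length in bits (Table 1) for a library number 1..512."""
    if number < 1:
        raise ValueError("library numbers start at 1")
    for upper, length in ((16, 5), (32, 7), (64, 9), (128, 11), (512, 12)):
        if number <= upper:
            return length
    raise ValueError("library numbers run from 1 to 512")


def unsorted_cost(events):
    """Bits needed when symbols are sent in scan order.

    events: library numbers (int), SAME or NEW.
    """
    cost = {SAME: 3, NEW: 5}
    return sum(cost[e] if e in cost else id_code_length(e) for e in events)


def sorted_cost(numbers, new_count):
    """Bits needed when matched symbols are sorted by library number."""
    total, previous = 0, None
    for n in sorted(numbers):
        if n == previous:
            total += 3                              # "same symbol" codeword
        elif previous is None:
            total += id_code_length(n)              # first symbol: absolute number
        else:
            total += id_code_length(n - previous)   # increase in ID number
        previous = n
    # The "new symbol" codeword is sent once; later new symbols cost 0 bits.
    return total + (5 if new_count else 0)


print(unsorted_cost([23, NEW, 28, SAME, 23, NEW]))   # 34
print(sorted_cost([23, 28, 28, 23], new_count=2))    # 23
```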

4.4  Coding  Summary 

The  coding  procedure  can  be  summarized  in  the  following  way: 

1.  All the symbols isolated along a line are matched (see
sections 2 and 3).

2.  At the end of the line, the matched symbols are sorted in
order of increasing symbol identification number. The new
library symbols are added at the end in sequential order (see
section 4.3.1).

3.  The symbols are coded with the information sent in the
following order:

a.  Horizontal position of symbol (see section 4.1).

b.  Symbol  identification.  If  it  is  a  new  symbol,  the 
identification  is  sent  only  for  the  first  new  symbol  on 
the  line  (see  section  4.3.2). 

c.  One or two codeword bits to specify the vertical shift of
a symbol, except if it is a new library symbol (see
section 4.1).

d.  For a new library symbol the following information is
sent (see also section 4.2).

•  A  header  bit  indicating  whether  the  horizontal  or 
vertical  coding  mode  is  chosen. 

•  A  5  bit  word  indicating  the  number  of  lines  of  the 
symbol  to  be  coded. 

•  CCITT  Modified  Read  coding  of  the  symbol 

e.  After  all  symbols  on  a  line  have  been  sent,  the  special 
horizontal  codeword  111  indicates  the  end  of  the  line 
(see  section  4.1). 

f.  The  library  update  is  made  according  to  the  updating 
rule  in  4.3.1.  The  symbols  are  updated  in  order  of 
increasing  ID  number.  After  updating,  all  symbols  with 
number  greater  than  480  are  deleted,  thus  allowing  for 
at  least  32  new  library  symbols  to  be  added  on  next 
line.  In  the  rare  cases  where  more  than  32  new  symbols 
are  encountered  on  a  line,  the  library  will  overflow. 
The problem is resolved by introducing an artificial
new  line.  The  portion  of  the  existing  line  that  has 
already  been  coded  is  terminated  in  the  usual  way.  The 
library  will  be  flushed  to  480  symbols.  The  artificial 
new  line  will  start  with  a  special  horizontal  codeword 
111  which  is  an  indication  that  it  is  an  artificial 
line.  Symbols  will  be  added  to  the  library  again  as 
needed. 
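The overflow handling in item f can be sketched as follows. This illustration only counts symbols; it does not model where new symbols are inserted or which symbols the flush removes. The names are hypothetical, and the capacity of 512 is taken from the library size mentioned in section 4.3.

```python
CAPACITY = 512   # library size (section 4.3)
FLUSH_TO = 480   # library size after the end-of-line flush (item f)


def flush(library):
    """Delete every library symbol with a number greater than 480."""
    del library[FLUSH_TO:]


def split_new_symbols(library_size, new_count):
    """Return how many new symbols are coded on the real line and on
    each artificial line that follows it."""
    segments = []
    size = library_size
    while True:
        take = min(CAPACITY - size, new_count)   # room left in the library
        segments.append(take)
        new_count -= take
        if new_count == 0:
            return segments
        size = FLUSH_TO   # library is flushed before the artificial line


print(split_new_symbols(480, 70))   # [32, 32, 6]: two artificial lines
print(split_new_symbols(300, 40))   # [40]: no overflow
```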

Figure  13  gives  an  example  of  message  transmission.  Table  2 
summarizes  the  different  codewords  that  are  sent. 


5.  DECODER

A  pattern  matching  decoder  is  inherently  simpler  than  a  pattern 
matching  encoder.  In  principle,  all  the  decoder  has  to  do  is  to 
reproduce  symbols  on  the  decoded  image  plane.  The  source  of  the 
symbols  is  either  the  symbol  library  when  a  symbol  identification 
number  is  detected  in  the  code  stream,  or  the  pel  description  of  a 
new  symbol  which  is  embedded  in  the  code  stream.  The  decoder  is 
completely  independent  of  the  matching  algorithm  that  is  used  in 
the  encoder. 

Our experience has shown that image quality is often improved by
slightly modifying certain decoded symbols. If a symbol is 30 pels
or more high or wide, the symbol is expanded by two pels on the high
or wide side. This is accomplished by doubling the row or column
three pels away from each border. This modification corrects small
distortions which can occur in large pattern regions of the image.
In addition, a local 3 by 3 postprocessing was found useful in
eliminating the artifacts in large black regions. The
postprocessor replaces a white pel by a black pel if the columns to
its left and to its right are black, or the rows above and below it
are black.
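The 3 by 3 postprocessing rule can be sketched as below. The interpretation is an assumption: "column" and "row" are taken to mean the three pels of the 3 by 3 window on that side of the white pel, border pels are left untouched, and all decisions are made on the unmodified input image. The function name is hypothetical.

```python
def fill_white_pels(image):
    """Return a copy of image (rows of 0/1 pels, 1 = black) in which a
    white pel is set to black when its left and right window columns,
    or its upper and lower window rows, are entirely black."""
    height = len(image)
    width = len(image[0]) if height else 0
    out = [row[:] for row in image]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            if image[y][x]:
                continue                       # already black
            left = all(image[y + d][x - 1] for d in (-1, 0, 1))
            right = all(image[y + d][x + 1] for d in (-1, 0, 1))
            above = all(image[y - 1][x + d] for d in (-1, 0, 1))
            below = all(image[y + 1][x + d] for d in (-1, 0, 1))
            if (left and right) or (above and below):
                out[y][x] = 1
    return out


gap = [[1, 0, 1],
       [1, 0, 1],
       [1, 0, 1]]
print(fill_white_pels(gap)[1])   # [1, 1, 1]: the centre pel is filled
```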

Experience has also shown that the image quality of decoded images
can often be improved by additional post processing operations.

The  decoder  may  produce  small  edge  discontinuities  where  it  is 
matching  up  symbols  to  form  a  large  pattern.  While  any  individual 
error  is  small,  the  human  eye  is  very  sensitive  to  the  resulting 
jagged  edges.  The  edge  discontinuity  can  usually  be  reduced  by 
post  processing.  Post  processing  algorithms  are  well  known  in  the 
facsimile  literature  and  are  not  an  inherent  part  of  the  decoder. 
Therefore  the  choice  of  a  post  processor  is  left  as  a 
manufacturer's  option. 


APPENDIX 2:  RESULTS

This  appendix  gives  results  of  the  coding  simulations.  We  used  for 
our  simulations  a  digitized  version  of  the  8  CCITT  documents  which 
was  supplied  by  the  French  administration.  The  page  format  was 
1728  x  2376  pels. 


1.  FACSIMILE QUALITY

The  binary  picture  is  modified  slightly  by  the  coder.  An 
acceptable  coder  must  not  produce  alterations  that  are  annoying  or 
noticeably  visible. 

There  are  three  types  of  alterations  that  can  result  from  pattern 
coding:  wrong  matches,  matches  with  a  slightly  distorted  pattern 
and  wrong  positioning.  In  the  case  of  a  wrong  match,  a  pattern  is 
replaced  by  a  different  pattern.  The  only  wrong  matches  we  found 
were confusions between O and 0, dot and comma, I and 1. Even human
observers  have  difficulty  recognizing  these  patterns  correctly  out 
of  context.  Therefore,  it  can  be  considered  that  the  system  has 
practically  no  wrong  matches. 

A  match  with  a  slightly  distorted  pattern  can  occur  with 
characters.  A  character  might  match  a  different  font  version  of 
the same character. Alternatively, a character might match a
thinned or thickened version of the same character. Distorted
matches,  contrary  to  wrong  matches,  are  tolerable  if  they  don't 
appear  too  often.  They  commonly  appear  when  two  slightly  different 
fonts  are  used  on  a  page  or  when  the  digitized  characters  come  from 
a  low  quality  typewriter  or  scanner. 

Wrong  positioning  of  a  pattern  decreases  received  copy  quality.  We 
observed  no  wrong  positioning  for  characters  or  other  symbols. 

Some  wrong  positioning  has  been  observed  for  non  symbol  patterns 
such  as  line  segments,  where  the  successive  patterns  make  the  lines 
slightly  jagged. 

Figures  2  through  9  show  the  CCITT  facsimile  images  after  pattern 
matching  coding  and  subsequent  decoding.  It  can  be  seen  that  there 
are no significant degradations. Some slight irregularities in
line drawings are observed, as for example in CCITT document 5
(Figure 6). A few "distorted matches" appear on CCITT document 1
(Figure  2). 


2.  COMPRESSION 

Table  3  gives  the  coding  lengths  for  the  CCITT  documents  which 
result from the AT&T pattern matching algorithm and from the
Modified  Read  Code.  Table  4  gives  the  transmission  time  at  64 
kbits/s.  Table  5  gives  the  compression  ratio  for  the  eight 
documents.  The  results  were  obtained  without  the  use  of  any 
synchronization  or  stuffing  bits. 

On the average, the transmission time is reduced by 47 percent
compared to the Modified Read Code, but the reduction varies with
the content of the document. For documents containing mostly
handwritten drawings and text, such as documents 2 and 8, there can
be a slight increase in the number of code bits. This obtains
because there are few matching patterns. For example, for document
2 there are 716 patterns, but 459 of them are library patterns. For
documents containing mostly text, such as document 4, the
transmission time is reduced by nearly a factor of 5. For documents
containing a mixture of text and drawings, the transmission time is
reduced by 30 to 45 percent. For document 7, the decrease is
smaller than for regular printed text because the number of Chinese
characters, and hence symbols, far exceeds the number of symbols
that would be found on a typed letter. However, the AT&T pattern
matching algorithm still doubled the compression ratio.
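The relation between the three tables can be sketched as follows. The definitions used here are assumptions: they are not stated explicitly in the text, but they reproduce the tabulated values for document 4. Transmission time is taken as coded bits divided by the 64 kbit/s line rate, and compression ratio as the number of pels on the 1728 x 2376 page divided by the coded bits. The function names are hypothetical.

```python
PAGE_PELS = 1728 * 2376   # page format given at the start of this appendix
LINE_RATE = 64_000        # bits per second


def transmission_time(bits):
    """Seconds needed to send the coded page at 64 kbit/s."""
    return bits / LINE_RATE


def compression_ratio(bits):
    """Uncoded pels per coded bit."""
    return PAGE_PELS / bits


def reduction_percent(pattern_bits, modified_read_bits):
    """Reduction in coded bits relative to the Modified Read Code."""
    return 100 * (1 - pattern_bits / modified_read_bits)


# Document 4 bit counts from Table 3.
pm_bits, mr_bits = 116140, 554186
print(round(transmission_time(pm_bits), 2))            # 1.81
print(round(transmission_time(mr_bits), 2))            # 8.66
print(round(compression_ratio(pm_bits), 2))            # 35.35
print(round(compression_ratio(mr_bits), 2))            # 7.41
print(round(reduction_percent(pm_bits, mr_bits), 1))   # 79.0
```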


APPENDIX 3:  EXTENSIONS

This  appendix  shows  how  the  proposed  pattern  matching  coding 
technique  can  be  extended  to  pages  wider  than  1792  pels  and 
resolutions  higher  than  200  pels/inch.  It  also  shows  how  the  coder 
can  control  the  amount  of  distortion  that  will  occur  during  the 
coding process. Finally, it shows how coding efficiency can in many
cases be improved if a prestored library is used to start the
coding process.


1.  EXTENSIONS  TO  DOCUMENTS  WIDER  THAN  1792  PELS 

Reserving the prefix code 111 when coding the horizontal position
in eleven bit two's complement binary restricts the page width to
1792 pels. For wider documents, the 11 bit codeword is replaced by
a longer one. For example, a 12 or 13 bit horizontal code allows
lines as wide as 3584 and 7168 pels respectively.
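The width limits quoted above follow from simple counting, sketched below. It is assumed that the reserved pattern 111 occupies the three leading bits of the horizontal position codeword, so that one eighth of the codewords is unavailable; this assumption reproduces the stated limits of 1792, 3584 and 7168 pels. The function names are hypothetical.

```python
def max_page_width(code_bits):
    """Number of usable positions when codewords starting with 111
    are reserved for the end-of-line indication."""
    if code_bits < 4:
        raise ValueError("the code must be longer than the 3 bit prefix")
    return 2 ** code_bits - 2 ** (code_bits - 3)


def min_code_bits(page_width):
    """Shortest horizontal position code that covers the page width."""
    bits = 4
    while max_page_width(bits) < page_width:
        bits += 1
    return bits


for bits in (11, 12, 13):
    print(bits, max_page_width(bits))   # 11 1792, 12 3584, 13 7168
print(min_code_bits(1728))              # 11
```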

The maximum width of the document being coded would be made known
to the decoder via protocol.


2.  EXTENSIONS  TO  HIGHER  RESOLUTION  DOCUMENTS 

The  relative  advantage  of  the  AT&T  pattern  matching  coding 
algorithm  over  the  standard  Modified  Read  algorithm  increases  with 
increased  resolution.  The  number  of  code  bits  produced  by  the 
Modified  Read  algorithm  increases  approximately  linearly  with 
increased  resolution.  Pattern  matching  generates  code  which 
consists  of  two  components.  One  part  is  the  symbol  description 
which  is  based  on  Modified  Read  code  and  therefore  generates  code 
which  increases  linearly  with  resolution.  The  other  part  is  the 
symbol  identification  and  position  information  that  is  only  weakly 
dependent  on  resolution.  The  net  effect  is  that  the  number  of  code 
bits  generated  by  the  pattern  matching  algorithm  will  increase  much 
less  with  increased  resolution  than  would  the  code  bits  generated 
by  the  Modified  Read  algorithm. 

The  extension  of  the  pattern  matching  algorithm  to  higher 
resolution  requires  a  few  changes  in  the  coding  algorithm  to  work 
efficiently.  One  change  comes  about  because  the  line  lengths  are 
typically  longer  at  higher  resolution.  The  solution  to  this  problem 
was  treated  above.  Another  change  comes  about  because  it  is 
advantageous  to  increase  the  block  size  to  fit  the  size  of  the 
pattern.  It  follows  that  a  block  size  of  48x48  would  be  effective 
for  300  pels/inch.  Similarly,  a  block  size  of  64x64  would  be  good 
for  400  pels/inch.  The  codeword  used  to  indicate  the  size  of  a 
symbol  for  a  block  size  of  up  to  64x64  would  need  to  be  increased 
to  6  bits.  Larger  block  sizes  would  result  in  correspondingly 
larger  symbol  size  description  codes. 
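The block sizes and size codeword lengths quoted above can be reproduced with the sketch below. It assumes that the block size scales in proportion to the resolution from the 32x32 block used at 200 pels/inch, and that the size codeword needs just enough bits to distinguish as many values as the block has lines. Both the scaling rule and the function names are assumptions made for illustration.

```python
def block_size(resolution, base_resolution=200, base_block=32):
    """Block edge in pels, scaled in proportion to the resolution."""
    # Ceiling division with integers avoids floating point rounding.
    return -(-base_block * resolution // base_resolution)


def size_word_bits(block):
    """Bits needed to distinguish `block` different symbol sizes."""
    return (block - 1).bit_length()


for resolution in (200, 300, 400):
    block = block_size(resolution)
    print(resolution, block, size_word_bits(block))
# 200 32 5
# 300 48 6
# 400 64 6
```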

The coder would let the decoder know the maximum block size via
protocol.


3.  CONTROL OVER CODER DISTORTION

There  might  be  some  cases  where  coding  with  practically  no 
distortion  is  preferred.  It  is  possible  to  improve  the  image 
quality  by  tightening  the  matching  criteria.  In  that  case  the 
coder  simply  tightens  the  rejection  criteria  when  it  compares  an 
unknown symbol with a symbol in the library. For example, the
coder can reject any match with weight 3 or greater instead of 4
or greater as was proposed in Appendix 1.

The matching criteria are local to the coder. Therefore the coder
does  not  need  to  inform  the  decoder  about  the  details  of  the 
matching  criteria. 


4.  PRESTORED LIBRARIES

Considerable  improvement  in  coding  efficiency  is  possible  if  the 
coder  starts  out  coding  a  new  page  from  a  prestored  code  library. 
This  reduces  the  number  of  patterns  that  need  to  be  sent  to  the 
decoder  to  fill  up  the  decoder's  library.  Prestored  libraries  are 
not  practical  for  arbitrary  images  because  the  symbols  in  the 
library  need  to  be  closely  matched  to  image  content. 

Prestored  libraries  are  useful  in  two  cases.  The  first  is  where  the 
character  font  is  known  a  priori  by  the  encoder.  In  this  case,  the 
starting  code  library  would  also  be  known  to  the  encoder.  This 
library  would  be  communicated  to  the  decoder  via  protocol.  A 
special  case  is  where  the  decoder  also  has  prestored  libraries  for 
a variety of different fonts. In that case, only the font style
need be communicated to the decoder.

A  second  case  obtains  for  multi-page  documents.  In  that  situation, 
it  is  usually  best  to  code  the  next  page  starting  with  the  library 
developed  at  the  conclusion  of  coding  the  last  page.  This 
situation  would  also  be  communicated  to  the  decoder  via  protocol. 


Symbol                    Codeword        Codeword length

same symbol               000             3
library symbol 1-16       1XXXX           5
new symbol                00100           5
library symbol 17-32      010XXXX         7
library symbol 33-64      0011XXXXX       9
library symbol 65-128     00101XXXXXX     11
library symbol 129-512    011XXXXXXXXX    12

Table 1:  Coding Table for the identification of the library symbols
          where XXXX is coded as Two's Complement Binary (see
          section 4.3.2).


Table 2:  Description of the Codewords for Pattern Matching Coding
          (Maximum Page Width 1792 pels)

Code Definition       Word Size   Description

Mode bit              1           Indicates whether there are any
                                  symbols on the line. The mode bit
                                  is 0 if any symbols are detected,
                                  1 if none are (see section 4.1).

Horizontal position   11          Gives in two's complement binary
                                  the absolute position of a symbol
                                  (see section 4.1).

No more symbol        3           111 indicates that there are no
                                  more symbols on the line (this
                                  codeword, 111, is a special
                                  horizontal position codeword)
                                  (see section 4.1).

Vertical move of      1 or 2      Indicates whether the symbol must
symbol                            be moved up (codeword 10) or down
                                  (codeword 11) by one line or is
                                  not moved at all (codeword 0)
                                  (see section 4.1).

Library               variable    Defines which library symbol is
identification code               coded (see Table 1 for coding).

Library symbol        1           Indicates whether the library
description header                symbol is coded in horizontal
                                  (header 0) or vertical (header 1)
                                  mode (see section 4.2).

Library symbol size   5           Gives in two's complement binary
                                  the number of lines which must be
                                  coded by the library symbol
                                  description code (see section 4.2).

Library symbol        variable    Slightly modified CCITT Modified
description                       Read code (see section 4.2).


Document   Pattern Matching   Modified Read   Reduction in number
           Code               Code            of bits coded

CCITT1           65426            144816          54.8%
CCITT2           89219             86416          -3.3%
CCITT3          158424            229639          31.0%
CCITT4          116140            554186          79.0%
CCITT5          141107            257767          45.3%
CCITT6          105982            133197          20.4%
CCITT7          269070            554247          51.5%
CCITT8          182509            152786         -19.5%

Total          1127877           2133052          47.1%

Table 3:  Comparison of number of bits coded.


Document   Pattern Matching   Modified Read
           Code               Code

CCITT1          1.02s              2.26s
CCITT2          1.39s              1.35s
CCITT3          2.48s              3.59s
CCITT4          1.81s              8.66s
CCITT5          2.20s              4.03s
CCITT6          1.66s              2.08s
CCITT7          4.20s              8.66s
CCITT8          2.85s              2.39s

Total           17.6s              33.0s

Table 4:  Comparison of transmission times at 64 kbits/s.


Document   Pattern Matching   Modified Read
           Code               Code

CCITT1          62.75              35.28
CCITT2          46.02              47.51
CCITT3          25.92              17.88
CCITT4          35.35               7.41
CCITT5          29.10              15.93
CCITT6          38.74              30.82
CCITT7          15.26               7.41
CCITT8          22.50              26.87

Table 5:  Comparison of Compression Ratios


Figure 2:  Document 1 after pattern matching coding and decoding.

Figure 4:  Document 3 after pattern matching coding and decoding.

Figure 5:  Document 4 after pattern matching coding and decoding.


Figure 6:  Document 5 after pattern matching coding and decoding.

Figure 7:  Document 6 after pattern matching coding and decoding.

Figure 8:  Document 7 after pattern matching coding and decoding.

Figure 9:  Document 8 after pattern matching coding and decoding.

FIGURE 10  TEMPLATE MATCHING OF TWO SIMILAR SYMBOLS WITH
a) AND b) ORIGINAL SYMBOL AND c) ERROR PICTURE.


FIGURE 11  TEMPLATE MATCHING OF TWO DIFFERENT SYMBOLS WITH
a) AND b) ORIGINAL SYMBOL AND c) ERROR PICTURE.


FIGURE 12  ILLUSTRATION OF THE CODING OF THE POSITIONS
OF SYMBOLS. IN THIS EXAMPLE, TWO LINES
HAVE NO SYMBOLS, THEN A LINE HAS THREE
SYMBOLS; THE FIRST ON POSITION 231 IS
REPLACED BY A LIBRARY SYMBOL, THE SECOND
ON POSITION 1532 IS A NEW LIBRARY SYMBOL.
THERE ARE NO SYMBOLS ON NEXT LINE.

[Diagram: for each symbol on the line, the message carries in order
the horizontal position of the symbol, the library ID code (or the
same symbol or new symbol codeword), and the vertical move of the
symbol. The library ID for symbol 28 is run length coded as 5
(28 - 23 = 5).]

FIGURE 13  EXAMPLE OF MESSAGE TRANSMISSION. IN THIS
EXAMPLE, THERE ARE TWO LINES WITHOUT
SYMBOLS. NEXT, SYMBOL 23 IS IN POSITION
936, THE SAME SYMBOL IS IN POSITION 1436,
SYMBOL 28 IS IN POSITION 416, SAME
SYMBOL IN POSITION 1231; THERE ARE TWO
NEW SYMBOLS ON POSITION 249 AND 998,
AND THERE ARE NO SYMBOLS ON NEXT LINE.